Predictors: body mass interacting with habitat.
mammal_sv <- read.csv("data/sv.csv")
log10sv is the base-10 logarithm of stroke volume in mlRename some variables and create species name from genus and species.
mammal_sv <- mammal_sv %>%
rename(source = Source,
order = order.corrected,
mass.kg = body.mass..kg.) %>%
mutate(order = str_to_title(order),
genus = str_to_title(genus),
animal = paste(genus, species),
above.10kg = ifelse(mass.kg >= 10, 'Larger', 'Smaller'))
Keep only variables we will be using. And factor() “chr” variables.
mammal_sv <- mammal_sv %>%
dplyr::select(order,
genus,
species,
common.name,
mass.kg,
log10.body.mass,
log10sv,
habitat,
source,
animal,
above.10kg
) %>%
mutate(across(where(is.character), factor)) %>%
arrange(order, genus, species)
A few quick graphs just to make sure the data are looking as we expect (error checking).
my_scatter_plot <- gf_point(log10sv ~ log10.body.mass | habitat,
color = ~order,
# size = ~parse_number(n_animals),
data = mammal_sv,
alpha = 0.5) %>%
gf_theme(legend.position = 'bottom',
legend.title = element_text(size = 8),
legend.text = element_text(size = 6)) %>%
gf_theme(scale_color_viridis_d('Order')) %>%
gf_labs(x = 'Log10(Mass (kg))',
y = 'Log10(SV (ml))')
my_scatter_plot
plotly::ggplotly(my_scatter_plot) %>%
plotly::layout(legend = list(#orientation = 'h',
font = list(size = 6)))
Notes: If we wanted to use for web or publication, we would refine. We can edit what info is shown when you hover over a data point, change color scheme, legend, etc.
Will not account for phylogeny at all in the model structure. Predictors: body mass interacting with habitat. Weight by number of individuals (if have data later, don’t have it now)?
lm_model <- lm(log10sv ~ log10.body.mass * habitat,
data = mammal_sv)
summary(lm_model)
##
## Call:
## lm(formula = log10sv ~ log10.body.mass * habitat, data = mammal_sv)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.38559 -0.09944 -0.03216 0.14260 0.36045
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.16485 0.20949 0.787 0.439
## log10.body.mass 0.99959 0.09295 10.754 7.25e-11 ***
## habitatland -0.19610 0.21600 -0.908 0.373
## log10.body.mass:habitatland 0.04781 0.09705 0.493 0.627
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1745 on 25 degrees of freedom
## Multiple R-squared: 0.9853, Adjusted R-squared: 0.9836
## F-statistic: 559.1 on 3 and 25 DF, p-value: < 2.2e-16
tab_model(lm_model)
| log 10 sv | |||
|---|---|---|---|
| Predictors | Estimates | CI | p |
| (Intercept) | 0.16 | -0.27 – 0.60 | 0.439 |
| log10.body.mass | 1.00 | 0.81 – 1.19 | <0.001 |
| habitat [land] | -0.20 | -0.64 – 0.25 | 0.373 |
|
log10.body.mass * habitat [land] |
0.05 | -0.15 – 0.25 | 0.627 |
| Observations | 29 | ||
| R2 / R2 adjusted | 0.985 / 0.984 | ||
lm_anova_results <- car::Anova(lm_model)
pander(lm_anova_results)
| Sum Sq | Df | F value | Pr(>F) | |
|---|---|---|---|---|
| log10.body.mass | 46.39 | 1 | 1524 | 6.308e-24 |
| habitat | 0.04624 | 1 | 1.519 | 0.2292 |
| log10.body.mass:habitat | 0.007388 | 1 | 0.2427 | 0.6265 |
| Residuals | 0.761 | 25 | NA | NA |
lm_preds <- predict(lm_model, se.fit = TRUE)
mammal_sv <- mammal_sv %>% mutate(lm_resids = resid(lm_model),
lm_fitted = lm_preds$fit,
lm_fit_lo = lm_preds$fit + 1.96*lm_preds$se.fit,
lm_fit_hi = lm_preds$fit - 1.96*lm_preds$se.fit)
gf_point(lm_resids ~ lm_fitted, data = mammal_sv)
acf(resid(lm_model))
gf_dhistogram(~lm_resids, data = mammal_sv,
bins = 7) %>%
gf_fitdistr()
Any predictors not shown in a plot are held constant at their mean or most common value.
gf_line(lm_fitted ~ log10.body.mass,
color = ~habitat,
data = mammal_sv) %>%
gf_ribbon(lm_fit_lo + lm_fit_hi ~ log10.body.mass,
color = ~habitat, fill = ~habitat) %>%
gf_theme(scale_color_manual(values = my_colors)) %>%
gf_theme(scale_fill_manual(values = my_colors))
gf_point(log10sv ~ lm_fitted, data = mammal_sv,
alpha = 0.1) %>%
gf_labs(y = 'Observed log10(fH)',
x = 'Model-predicted log10(fH)',
title = 'Linear Regresssion (no phylogeny)') %>%
gf_abline(slope = 1, intercept = 0, color = 'black', linetype = 'dashed')
Can refine the plot of predictions later on.
Will include nested random effects of order/family/genus/species, expecting similarity of observations based on a hierarchy of phylogenetic relatedness, but not in a specified/structured way; only groupings are used, with no sense of (for example) the fact that two orders are thought to be “further” apart than any other two.
re_model <- glmmTMB(log10sv ~ log10.body.mass * habitat +
(1 | order/genus/species),
data = mammal_sv)
summary(re_model)
## Family: gaussian ( identity )
## Formula:
## log10sv ~ log10.body.mass * habitat + (1 | order/genus/species)
## Data: mammal_sv
##
## AIC BIC logLik deviance df.resid
## -9.5 1.5 12.7 -25.5 21
##
## Random effects:
##
## Conditional model:
## Groups Name Variance Std.Dev.
## species:(genus:order) (Intercept) 1.740e-03 4.172e-02
## genus:order (Intercept) 2.360e-02 1.536e-01
## order (Intercept) 8.935e-11 9.452e-06
## Residual 1.579e-03 3.973e-02
## Number of obs: 29, groups:
## species:(genus:order), 29; genus:order, 27; order, 9
##
## Dispersion estimate for gaussian family (sigma^2): 0.00158
##
## Conditional model:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.16485 0.19699 0.837 0.403
## log10.body.mass 0.99959 0.08741 11.436 <2e-16 ***
## habitatland -0.17116 0.20442 -0.837 0.402
## log10.body.mass:habitatland 0.03755 0.09173 0.409 0.682
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
tab_model(re_model)
| log 10 sv | |||
|---|---|---|---|
| Predictors | Estimates | CI | p |
| (Intercept) | 0.16 | -0.22 – 0.55 | 0.403 |
| log10.body.mass | 1.00 | 0.83 – 1.17 | <0.001 |
| habitat [land] | -0.17 | -0.57 – 0.23 | 0.402 |
|
log10.body.mass * habitat [land] |
0.04 | -0.14 – 0.22 | 0.682 |
| Random Effects | |||
| σ2 | 0.00 | ||
| τ00 species:(genus:order) | 0.00 | ||
| τ00 genus:order | 0.02 | ||
| τ00 order | 0.00 | ||
| N species | 28 | ||
| N genus | 27 | ||
| N order | 9 | ||
| Observations | 29 | ||
| Marginal R2 / Conditional R2 | 0.999 / NA | ||
re_anova_results <- car::Anova(re_model)
pander(re_anova_results)
| Chisq | Df | Pr(>Chisq) | |
|---|---|---|---|
| log10.body.mass | 1520 | 1 | 0 |
| habitat | 1.576 | 1 | 0.2093 |
| log10.body.mass:habitat | 0.1676 | 1 | 0.6823 |
re_ave_preds <- predict(re_model,
se.fit = TRUE,
re.form = ~0)
re_ind_preds <- predict(re_model,
se.fit = TRUE,
re.form = NULL)
mammal_sv <- mammal_sv %>%
mutate(re_resids = resid(re_model),
re_ind_fitted = re_ind_preds$fit,
re_ave_fitted = re_ave_preds$fit,
re_ave_lo = re_ave_preds$fit - 1.96*re_ave_preds$se.fit,
re_ave_hi = re_ave_preds$fit + 1.96*re_ave_preds$se.fit)
gf_point(re_resids ~ re_ind_fitted, data = mammal_sv)
# save fitted model and data
saveRDS(mammal_sv, 'fitted-models/sv-data.RDS')
saveRDS(re_model, 'fitted-models/sv-re-model.RDS')
acf(resid(re_model))
acf(resid(re_model))
gf_dhistogram(~re_resids, data = mammal_sv,
bins = 7) %>%
gf_fitdistr()
gf_line(re_ave_fitted ~ log10.body.mass,
color = ~habitat,
data = mammal_sv) %>%
gf_ribbon(re_ave_lo + re_ave_hi ~ log10.body.mass,
color = ~habitat, fill = ~habitat) %>%
gf_theme(scale_color_manual(values = my_colors)) %>%
gf_theme(scale_fill_manual(values = my_colors))
gf_point(log10sv ~ re_ind_fitted, data = mammal_sv,
alpha = 0.1) %>%
gf_labs(y = 'Observed log10(sv)',
x = 'Model-predicted, Species-specific log10(sv)',
title = 'Mixed-effects Model (RE of Order/Genus/Species)') %>%
gf_abline(slope = 1, intercept = 0, color = 'black', linetype = 'dashed')
The graph above is a bit “cheating” as we have a random effect of species, but there are only 1-2 measurements for most of the species (nearly guaranteeing that our estimates will be nearly perfect). What if we also check out the predictions accounting for the modeled effects of Order and Genus, but predicting for the “average” species in each Genus?
gf_line(re_ave_fitted ~ log10.body.mass,
color = ~habitat,
data = mammal_sv) %>%
gf_ribbon(re_ave_lo + re_ave_hi ~ log10.body.mass,
color = ~habitat, fill = ~habitat) %>%
gf_theme(scale_color_manual(values = my_colors)) %>%
gf_theme(scale_fill_manual(values = my_colors))
gf_point(log10sv ~ re_ind_fitted, data = mammal_sv,
alpha = 0.1) %>%
gf_labs(y = 'Observed log10(fH)',
x = 'Model-predicted, Order-specific log10(fH)',
title = 'Mixed-effects Model (RE of Order/Genus)') %>%
gf_abline(slope = 1, intercept = 0, color = 'black', linetype = 'dashed')
#Read in the trees from Upham et al
tree_path <- paste("data/upham-trees/Completed_5911sp_topoCons_FBDasZhouEtAl")
tree_files <- list.files(tree_path)
all_trees <- list()
for (i in 1:length(tree_files)){
all_trees[[i]] <- read.tree(paste0(tree_path, '/',
tree_files[i]))
if (i == 1){
treeset <- all_trees[[i]]
}else{
treeset <- c(treeset, all_trees[[i]])
}
}
all_tip_labels <- purrr::map(treeset, "tip.label")
all_tip_labels <- purrr::map(all_tip_labels,
function(x)
stringr::str_replace_all(x, pattern = '_',
replacement = ' '))
# get list of species that are in ALL the trees
for (t in 1:length(all_tip_labels)){
if (t == 1){
tip_labs <- all_tip_labels[[t]]
}else{
tip_labs <- intersect(tip_labs, all_tip_labels[[t]])
}
}
# keep only the species in mammal_sv that are in all the trees
# on 6/17 this removes NO species.
pgls_data <- mammal_sv %>%
mutate(animal = as.character(animal)) %>%
filter(animal %in% tip_labs) %>%
droplevels()
taxonomy <- read_csv('data/upham-trees/taxonomy_mamPhy_5911species.csv')
Fit models, one model for every tree in our list.
pgls_models <- list()
for (t in c(1:length(treeset))){
# make sure there is only one row of data per species
pgls_rep_data <- pgls_data %>%
group_by(animal) %>%
sample_n(1) %>%
ungroup
#Reduce the tree to only include those species in the data set
refit_tree <- treeset[[t]]
refit_tree$tip.label <- str_replace_all(refit_tree$tip.label, '_', ' ')
refit_tree <- drop.tip(refit_tree,
setdiff(refit_tree$tip.label,
unique(pgls_rep_data %>% pull(animal))))
#Order the data set so that it is in the same order as the tip labels of the tree
pgls_rep_data <- left_join(data.frame(tree.tip.label = refit_tree$tip.label),
pgls_rep_data,
by = c('tree.tip.label' = 'animal'),
keep = TRUE)
# fit the model
pgls_models[[t]] <- tryCatch({
fittd <- gls(log10sv ~ log10.body.mass * habitat,
correlation = corPagel(value = 0.8,
phy = refit_tree,
fixed = FALSE,
form = ~animal),
data = pgls_rep_data)
},
error = function(cond){
message(paste('PGLS fit failed for tree', t))
return(NULL)
}
)
}
pgls_models <- pgls_models[!sapply(pgls_models, is.null)]
Note: we tried to fit 101 PGLS models (each with a different tree); of these, model fitting failed for 4.
Combine the many PGLS model runs together into one summary combined model according to Rubin’s rule.
# as.mira takes the list of models and create an object to be used by the mice package
pgls_mira <- as.mira(pgls_models)
# # pool summarise the models using Rubin's rule corrected for small samples
pooled_pgls <- pool(pgls_mira)
pooled_pgls_summ <- summary(pooled_pgls, type = 'all', conf.int = TRUE)
pander(pooled_pgls_summ %>% dplyr::select(term, estimate, std.error, `2.5 %`, `97.5 %`, lambda, fmi))
| term | estimate | std.error | 2.5 % | 97.5 % |
|---|---|---|---|---|
| (Intercept) | 0.2041 | 0.21 | -0.2354 | 0.6436 |
| log10.body.mass | 0.9929 | 0.09319 | 0.7988 | 1.187 |
| habitatland | -0.2378 | 0.2206 | -0.6989 | 0.2234 |
| log10.body.mass:habitatland | 0.05945 | 0.09692 | -0.1422 | 0.2611 |
| lambda | fmi |
|---|---|
| 0.1758 | 0.2507 |
| 0.1154 | 0.1908 |
| 0.1604 | 0.2354 |
| 0.09824 | 0.1737 |
# this depends on inner workings of package car found in gls-utils.R
# it used to work without all this, in summer 2021 stopped
pgls_anova <- Anova(pgls_mira)
pander(pgls_anova)
| F | num df | den df | missing info | riv | |
|---|---|---|---|---|---|
| log10.body.mass | 1419 | 1 | 6153 | 0.1201 | 0.1365 |
| habitat | 1.953 | 1 | 2290 | 0.1972 | 0.2457 |
| log10.body.mass:habitat | 0.3762 | 1 | 9181 | 0.09824 | 0.1089 |
| Pr(>F) | |
|---|---|
| log10.body.mass | 1.219e-279 |
| habitat | 0.1624 |
| log10.body.mass:habitat | 0.5397 |
Alternative approach: using MuMIn to do model averaging. Was it right to weight all of the models/trees equally?
pgls_avg <- model.avg(pgls_models,
rank = function(x) 1)
This gives a model with the same coefficients (same model) that we can better make predict()ions from.
For the PGLS model, we also want to extract the estimate of \(\Lambda\), which tells us about how the phylogeny is affecting the correlation structure.
Simple approach: take the mean of the estimates of \(\Lambda\) from all the individual fitted models.
lambda <- mean(unlist(purrr::map(pgls_models, function(x) x$modelStruct$corStruct)))
lambda
## [1] -0.110112
It’s not really clear how to even approach this, since we don’t really any longer expect the residuals to be normal or independent. And based on the data we know there’s not a huge issue with linearity. So…ok?